Applying WebTables in Practice

Authors

Sreeram Balakrishnan

Alon Y. Halevy

Boulos Harb

Hongrae Lee

Jayant Madhavan

Afshin Rostamizadeh

Warren Shen

Kenneth Wilder

Fei Wu

Cong Yu

Abstract

We began investigating the collection of HTML tables on the Web and developed the WebTables system a few years ago [4]. Since then, our work has been motivated by applying WebTables in a broad set of applications at Google, resulting in several product launches. In this paper, we describe the challenges faced, lessons learned, and new insights gained from these efforts. The main challenges were (1) identifying tables that are likely to contain high-quality data (as opposed to tables used for navigation, layout, or formatting), and (2) recovering the semantics of these tables or signals that hint at their semantics. The result is a semantically enriched table corpus that we used to develop several services. First, we created a search engine for structured data whose index includes over a hundred million HTML tables. Second, we enabled users of Google Docs (through its Research Panel) to find relevant data tables and to insert such data into their documents as needed. Most recently, we brought WebTables to a much broader audience by using the table corpus to provide richer tabular snippets for fact-seeking web search queries on Google.com.

Similar resources

Surveillance and monitoring: a vital investment for the changing burdens of disease.

ed from webtables. EDITORIAL 1141

Evidence based medicine in nuclear medicine practice; Part II: Appraising and applying the evidence

As described in the first part of this article, Evidence Based Medicine (EBM) is a growing part of medical practice that emphasizes the best evidence. How to find this evidence, by formulating an answerable question and using search strategies, was described in the first part of this review. In this part, appraising the retrieved article (with the main focus on the diag...

QUICK: Expressive and Flexible Search over Knowledge Bases and Text Collections

Recent work on Web-extracted data sets has produced an interesting new source of structured Web data. These data sets can be viewed as knowledge bases (KB) – large heterogeneous linked entity collections with millions of unique edge and node labels, often encoding rich semantic information over entities. For example, YAGO [5] and ExDB [2] have fact collections numbering in the tens and hundreds...

Schema Extraction for Tabular Data on the Web

Tabular data is an abundant source of information on the Web, but remains mostly isolated from the latter’s interconnections since tables lack links and computer-accessible descriptions of their structure. In other words, the schemas of these tables — attribute names, values, data types, etc. — are not explicitly stored as table metadata. Consequently, the structure that these tables contain is...

Uncovering the Relational Web

The World-Wide Web consists of a huge number of unstructured hypertext documents, but it also contains structured data in the form of HTML tables. Many of these tables contain both relational-style data and a small “schema” of labeled and typed columns, making each such table a small structured database. The WebTables project is an effort to extract and make use of the huge number of these stru...

Publication date: 2015
